- >I have written a robot that does this, except it doesn't check for
- >valid SGML -- it just tries to map out the entire web. I believe I
- >found roughly 50 or 60 different sites (this was maybe 2 months ago --
- >I'm sorry, I didn't save the output). It took the robot about half a
- >day (a saturday morning) to complete.
-
- If you do run your robot again, I would be very interested if you could
- generate a simple list of document titles and their corresponding
- document IDs (or URLs). We have a powerful SPIRES database here,
- interfaced to the web, into which we could easily import such a file to
- create a VERONICA-like index of the web. I think that would be pretty
- useful (unless someone is already doing it?).
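
As a rough illustration of the kind of file meant here, the following is a minimal sketch, assuming the robot already has each document's URL and HTML text in hand. The names `TitleParser` and `title_index`, the tab-separated layout, and the `(untitled)` placeholder are all hypothetical choices made for this example; the format a SPIRES import would actually need is not specified here.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so each title fits on one line.
        return " ".join("".join(self._parts).split())


def title_index(pages):
    """Turn (url, html_text) pairs into 'URL<TAB>title' lines.

    The tab-separated layout is an assumption made for this sketch.
    """
    lines = []
    for url, html_text in pages:
        parser = TitleParser()
        parser.feed(html_text)
        parser.close()
        lines.append(f"{url}\t{parser.title or '(untitled)'}")
    return lines
```

A robot could call `title_index` on the pages it has fetched and write the resulting lines to a file, one document per line.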
-
- One other problem to add to your list: many documents are probably
- only accessible by giving a "keyword". Unless you can write a robot
- that can successfully guess all possible keywords, you cannot
- guarantee that it will traverse the whole web.
-
- Tony
-
-